(54) Adaptive video coding method

(57) In accordance with the present invention, a method is described which achieves a constant bit
rate output when coding multiple video objects. This implementation makes use of a quadratic
rate-distortion model. Each object is described by its own set of parameters. With these
parameters, an initial target bit estimate is made for each object after a first frame is encoded.
Based on output buffer fullness, the total target is adjusted and then distributed proportional to
a parameter set representative of the activity of the objects in the frame. Activity is determined
by reference to weighted ratios derived from motion, size and variance parameters associated with
each object. A shape rate control parameter is also invoked. Based on the new individual targets
and second order model parameters, appropriate quantization parameters can be calculated for each
video object. This method assures that the target bit rate is achieved for low latency video
coding.

In order to provide a suitable bit rate control system based on a quadratic rate-distortion model,
it has been found that control information may be applied jointly with respect to video objects
(VO's), rather than entire frames.
Description

RELATED APPLICATION 

This application is a continuation-in-part of Application Serial No. 08/800,880, filed February 14, 1997, in the
names of Hui-Fang Sun and Anthony Vetro, entitled "Adaptive Video Coding Method", which application is assigned to
the same assignee as the present application.

BACKGROUND OF THE INVENTION 

FIELD OF THE INVENTION

This invention relates to methods of coding video signals for digital storage and/or transmission of such signals
using joint rate control for multiple video objects based on a quadratic rate-distortion model.

More particularly, this invention relates to a method of encoding video employing a joint rate control algorithm for
multiple video object coding. The algorithm is based on the VM7 rate control scheme as described in the MPEG-4
Video Verification Model V7.0, ISO/IEC JTC1/SC29/WG11, Coding of Moving Picture and Associated Audio, MPEG
97/N1642, April 1997, Bristol, U.K.

The method follows a framework similar to that proposed previously by the current inventors in their parent
application, with a change in the method of target distribution and the introduction of a tool to take object shape
into account in the rate control process. These modifications contribute to more homogeneous quality among video
objects and better buffer regulation. As a whole, the method provides an effective means of coding multiple video
objects so that the buffer is well-regulated and bits are appropriately distributed; yet it is flexible in deciding
the necessary compromise between spatial and temporal quality.

DESCRIPTION OF THE PRIOR ART 

A basic method for compressing the bandwidth of digital color video signals which has been adopted by the Motion 
Picture Experts Group (MPEG) utilizes Discrete Cosine Transform (DCT) techniques. In addition, the MPEG approach 
employs motion compensation techniques. 

The MPEG standard achieves high data compression rates by developing information for a full frame of the image
only every so often. The full image frames, or intra-coded pictures, are called "I-frames" and contain the full frame
information independent of any other frames. Between the I-frames, there are so-called B-frames and P-frames which
store only image differences which occur relative to reference anchor frames.

More specifically, each frame of a video sequence is partitioned into smaller blocks of pixel data and each block is
subjected to the discrete cosine transformation function to convert the statistically dependent spatial domain picture
elements (pixels) into independent frequency domain DCT coefficients.

That is, the blocks of data, encoded according to intraframe coding (I-frames), consist of matrices of Discrete
Cosine Coefficients. Respective 8 x 8 or 16 x 16 blocks of pixels are subjected to a Discrete Cosine Transform (DCT)
to provide a coded signal. The coefficients are subjected to adaptive quantization, and then are run-length and
variable-length encoded. Hence, respective blocks of transmitted data may include fewer than an 8 x 8 matrix of
codewords. Macroblocks of intraframe encoded data will include, in addition to the DCT coefficients, information such
as the level of quantization employed, a macroblock address or location indicator, and a macroblock type, the latter
information being referred to as "header" or "overhead" information.

Blocks of data encoded according to P or B interframe coding also consist of matrices of Discrete Cosine
Coefficients. In this instance, however, the coefficients represent residues or differences between a predicted 8 x 8
pixel matrix and the actual 8 x 8 pixel matrix. These coefficients are subjected to quantization and run- and
variable-length coding. In the frame sequence, I and P frames are designated anchor frames. Each P frame is predicted
from the lastmost occurring anchor frame. Each B frame is predicted from one or both of the anchor frames between
which it is disposed. The predictive coding process involves generating displacement vectors which indicate which
block of an anchor frame most closely matches the block of the predicted frame currently being coded. The pixel data
of the matched block in the anchor frame is subtracted, on a pixel-by-pixel basis, from the block of the frame being
encoded, to develop the residues. The transformed residues and the vectors comprise the coded data for the predictive
frames. As with intraframe coded frames, the macroblocks include quantization, address and type information.

The results are usually energy concentrated so that a few of the coefficients in a block contain the main part of the
picture information. The coefficients are quantized in a known manner to effectively limit the dynamic range of the
coefficients and the results are then run-length and variable-length encoded for application to a transmission medium.

In a recent proposal for implementing the latest coding verification model (VM), which is described in "MPEG-4
Video Verification Model Version 5.0", distributed by Adhoc group on MPEG-4 video VM editing to its members under
the designation ISO/IEC JTC1/SC29/WG11 MPEG 96/N1469, November 1996, the contents of which are incorporated
herein by reference, representatives of the David Sarnoff Research Center proposed "A New Rate Control Scheme
Using Quadratic Rate Distortion Model". The MPEG-4 video coding format will produce a variable bit rate stream at the
encoder from frame to frame (as was the case with prior schemes). Since the variable bit rate stream is to be
transmitted over a fixed rate channel, a channel buffer is employed to smooth out the bit stream. In order to prevent
the buffer from overflowing or underflowing, rate control of the encoding process is required.

In the recent Sarnoff proposal, before the encoding process begins for a given set of frames (picture), a target bit
rate for each frame is calculated to accommodate the fact that the output bit rate from the output of the encoder is
constrained to a fixed bit rate while the bit rate resulting from picture encoding can vary over a relatively wide
range (if left uncorrected), depending on the content of the image frame. According to the proposal, the distortion
measure associated with each frame is assumed to be the average quantization scale of the frame and the rate
distortion function is modeled as a second order function of the inverse of the distortion measure. Before the actual
encoding process begins, the target bit rate of the image is estimated by the number of bits left for coding the
group of images, as well as the number of frames still to be encoded. The authors mention implementing their scheme
at the picture level and also note a possibility for extending their scheme to the macroblock level.

It has also been known that when a block (macroblock) contains an edge boundary of an object, the energy in that
block after transformation, as represented by the DCT coefficients, includes a relatively large DC coefficient (top
left corner of matrix) and randomly distributed AC coefficients throughout the matrix. A non-edge block, on the other
hand, usually is characterized by a similar large DC coefficient (top left corner) and a few (e.g. two) adjacent AC
coefficients which are substantially larger than other coefficients associated with that block. This information
relates to image changes in the spatial domain and, when combined with image difference information obtained from
comparing successive frames (i.e. temporal differences), factors are available for distinguishing one video object
(VO) from another.

As shown in Figure 1 (a sample video scene), one or more video objects (VO1, VO2, VO3) may be contained in an
image frame or plane (VOP) and, in each successive frame, the relative positioning of video objects may be expected
to change, denoting motion. At the same time, this motion assists in defining the objects.

Under the MPEG-4 VM, additional objectives of content-based manipulation and independent bit stream coding
have been imposed to provide added functionality at the decoder end of the system. The MPEG-4 objective complicates
and imposes additional processing requirements on the process of predicting target bit rates for each frame as a
result of the added overhead information such as the coding of shape information within the MPEG-4 encoder. The
foregoing characteristics of the MPEG-4 VM, as well as information regarding identification of individual VO's, are
explained in greater detail in the above-referenced manual.

It is an object of the present invention to provide an adaptive video coding method which is particularly suitable for
an MPEG-4 encoder and other encoding schemes.

It is a further object of the present invention to provide an adaptive video coding method for use in accordance with
MPEG-4 VM wherein individual video objects (VO's) are taken into account in providing an improved bit rate control
system making use of relative motion, size, variance and shape of each VO.

SUMMARY OF THE INVENTION 



In accordance with the present invention, a method is described which achieves a constant bit rate output when
coding multiple video objects. This implementation makes use of a quadratic rate-distortion model. Each object is
described by its own set of parameters. With these parameters, an initial target bit estimate is made for each object
after a first frame is encoded. Based on output buffer fullness, the total target is adjusted and then distributed
proportional to a parameter set representative of the activity of the objects in the frame. Activity is determined by
reference to weighted ratios derived from motion, size and variance parameters associated with each object. A shape
rate control parameter is also invoked. Based on the new individual targets and second order model parameters,
appropriate quantization parameters can be calculated for each video object. This method assures that the target bit
rate is achieved for low latency video coding.

In order to provide a suitable bit rate control system based on a quadratic rate-distortion model, it has been found
that control information may be applied jointly with respect to video objects (VO's), rather than entire frames.
In the drawing:

Figure 1 is a schematic pictorial representation of three successive image frames having two video objects
(VO1 and VO2) and a background image, where each of the VO's moves from left to right in the
scene over time;

Figure 2 is a block diagram illustrating steps in the method according to the invention of our parent
application, along with the interrelationships among such steps;

Figure 3 is a block diagram illustrating steps in the method according to the present invention, along with the
interrelationships among such steps;

Figure 4 is a block diagram of an MPEG-4 encoder which has been modified to implement the inventions of
our parent application and/or the present invention;

Figure 5 is a diagram of the parameter "AlphaTH" as a function of time which illustrates a time sequence of
shape rate control decisions based on mode of operation ("H" or "L") according to the present
invention; and

Figures 6 to 16 are a set of plots of buffer occupancy (bits) percentage versus frame for a series of video signal
sequences representative of particular named images encoded with low and high bit rates in
accordance with the present invention as identified in Table 2 and Table 4 below.

DETAILED DESCRIPTION

As is set forth in our earlier-filed parent U.S. Patent Application referred to above, a method for performing joint
bit rate control can be broken down into a pre-encoding stage and a post-encoding stage.

As shown in Figure 2, a pre-encoding stage 20 comprises (i) target bit estimation 21, (ii) joint buffer control 22,
(iii) a pre-frameskip control 24, and (iv) a quantization level calculation 25. The post-encoding stage 27 comprises
(i) updating the rate-distortion model 28, and (ii) a post-frameskip control 29. An important aspect of this scheme,
not evident from the block structure, is that most blocks require previous operations to be complete for every video
object (VO). For instance, inter-coding 31 of the next VO to be coded will not begin until all quantization levels
for preceding VO's have been calculated. In this embodiment, all the VO's are coded at the same frame rate. Many
aspects of the current implementation nevertheless anticipate a migration towards a different frame rate for each VO,
although a more complex buffer control will then be required.

In a preferred embodiment of our parent application, an adaptive video encoder (Figure 4) is arranged to follow the
method illustrated in Figure 2. A Digital Image Source 10 provides image information on a frame basis or on a Video
Object (VO) basis to a video signal encoder 12 in digitized form. The image is partitioned into spatially
non-overlapping blocks of pixel data. A block size of 8 x 8 pixels or 16 x 16 pixels may be employed. Each
partitioned block is then processed.

A motion estimator 14 is used to estimate a motion vector for the input block with reference to a temporally close
reference frame stored in a frame memory (previously reconstructed VOP 16). The reference frame may be an original
unprocessed frame or a previously coded frame. Bi-directional motion estimation such as that described in the MPEG
standards can also be applied.

A motion compensation block 11, a texture coding block 13, a shape coding block 15, a constant output bit rate
buffer 17 and an MSDL multiplexer 19, all arranged as described in the MPEG-4 reference document, are provided. In
addition, a rate control system 18 (as described in connection with Figure 2) is provided to perform added functions
according to the invention of our parent application.

Referring again to Figure 2, the pre-encoding stage 20 further includes Initialization 26 (see Table 1 below).

A. Initialization 26

In this section, most rate control variables (e.g., first and second order complexities and MAD or Mean Absolute
Difference information) have been extended to vectors so that each VO can carry its own separate information. Among
those that have not been changed are the remaining number of bits for the segment and the rate of buffer drain. Table
1 summarizes the notations used to describe the method.
Table 1 Notation used for joint rate control based on quadratic rate-distortion model.

VARIABLES        DESCRIPTION
Buff_drain       Number of bits to be removed from the buffer per picture
MAD[i]           Mean absolute difference for current VO after motion compensation
X1[i], X2[i]     First and second order complexity measures
Q[i]             Quantization parameter for ith VO
N_skip_post      Number of frames to skip according to post-frameskip
N_skip_pre       Number of frames to skip according to pre-frameskip
N_skip           Total number of frames to be skipped
N_btwn           Number of frames between encoded frames
B_left           Number of bits left for coding the sequence
T_texture[i]     Texture bit count for ith VO
T_texture        Total texture bit count (all VOs)
T[i]             Bit count for ith VO including texture, shape, motion and header bits
T                Total bit count including texture, shape, motion and header bits (all VOs)
H[i]             Header bit count including shape and motion
H                Total header bit count (all VOs)
Buff_size        Size of buffer
Buff_level       Current fullness of buffer



B. Post-Encoding Stage 27

After the encoding stage 30, the parameters for the rate-distortion model must be sought. For multiple-VO, the
encoder rate-distortion function is modeled as:



From the above equation, the model parameters, X1_i and X2_i, can be calculated separately for every VO. In the
above equation, the target value, T_texture, is decomposed into multiple T_texture_i, which correspond to the amount
of bits used for coding the texture component only of the ith VO.
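
For orientation, a form of the model consistent with the definitions above can be written out. It assumes the usual second-order relation in which the texture bits of each VO are a quadratic function of 1/Q_i scaled by MAD_i; the exact expression, including the summation over VOs, is an assumption and not a quotation of the original equation:

```latex
T\_texture \;=\; \sum_{i} T\_texture_i
\;=\; \sum_{i} \left( \frac{X1_i \cdot MAD_i}{Q_i} + \frac{X2_i \cdot MAD_i}{Q_i^{2}} \right)
```

Under this assumed form, X1_i and X2_i for a given VO depend only on that VO's own texture bits, MAD and quantization parameter, which is why they can be fitted separately for each object.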

Referring to Figure 2, the next step in the post-encoding stage 27 is the post-frameskip control function 29. At this
point the buffer 17 has been updated. Overflow is prevented by checking the current buffer level against a skip margin,
γ. If the current buffer level is above the designated margin, frames are continually skipped, i.e., N_skip_post is
incremented, until a specific criterion is met.

In accordance with one embodiment of our parent application, this post-frameskip control is incremented until the
criterion:

    Buff_level - N_skip_post * Buff_drain < (1 - γ) * Buff_size

is met.

In a preferred arrangement for our parent application, γ is chosen to equal 0.2. After the condition of the equation
above has been satisfied and N_skip_post has been found, the value of N_skip_pre is added to it. The determination
of N_skip_pre will be discussed shortly. The final value, N_skip = N_skip_pre + N_skip_post, is equal to the total
frames to be skipped. It is this value which determines the new time instant. Note that the time instant can only be
updated after the post-frameskip control function occurs.
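
The post-frameskip rule can be sketched in a few lines of Python. This is a minimal illustration, not the encoder's actual code: the function names are hypothetical, and starting the count at zero is an assumption.

```python
def post_frameskip(buff_level, buff_drain, buff_size, gamma=0.2):
    """Count frames to skip until the drained buffer drops below the skip margin.

    Implements: Buff_level - N_skip_post * Buff_drain < (1 - gamma) * Buff_size.
    """
    if buff_drain <= 0:
        raise ValueError("buff_drain must be positive, otherwise the loop cannot end")
    n_skip_post = 0
    # Keep skipping while the buffer, after draining, is still above the margin.
    while buff_level - n_skip_post * buff_drain >= (1.0 - gamma) * buff_size:
        n_skip_post += 1
    return n_skip_post


def total_skip(n_skip_pre, n_skip_post):
    """N_skip = N_skip_pre + N_skip_post, the total number of frames to skip."""
    return n_skip_pre + n_skip_post
```

With a 10000-bit buffer, a level of 9500 bits and a drain of 1000 bits per picture, two frames are skipped before the level falls under the 8000-bit margin.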

Proceeding with the next pre-encoding stage of the first arrangement, the initial target bit rate is calculated based
on the number of available bits for the segment and the number of bits used in the previous corresponding VO. A similar
lower bound to the frame-based simulation is used so that minimum quality is guaranteed.

    T[i] = Max{ B_left / (30 * numVOs), B_left / (numVOleft[i] * numVOs) }

    T[i] = T[i] * (1 - α) + B_past[i] * α

The weighting α represents a contribution from the past frame and is set to 0.2 in that implementation.
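
The initial target estimate can be sketched as follows. This is a hedged illustration: the function and argument names are hypothetical, and `num_vo_left` is assumed to hold, per VO, the number of instances of that VO still to be coded.

```python
def initial_targets(b_left, num_vo_left, b_past, alpha=0.2):
    """Initial per-VO target: a lower-bounded share of B_left blended with past usage.

    b_left      -- bits left for coding the sequence
    num_vo_left -- per-VO count of remaining instances (assumed meaning of numVOleft[i])
    b_past      -- per-VO bits used in the previous corresponding VO
    """
    num_vos = len(num_vo_left)
    targets = []
    for left, past in zip(num_vo_left, b_past):
        # Lower bound B_left / (30 * numVOs) guarantees a minimum quality.
        t = max(b_left / (30 * num_vos), b_left / (left * num_vos))
        # Blend in the contribution of the past frame with weight alpha.
        targets.append(t * (1.0 - alpha) + past * alpha)
    return targets
```

For example, with 60000 bits left, two VOs with 100 instances each remaining, and 400 and 200 bits used previously, the lower bound of 1000 bits dominates and the blended targets come out at about 880 and 840 bits.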

Once the initial target has been set, adjustments based on the buffer 18 fullness are made according to
T = T * (a + 2b)/(2a + b), where a = Buff_level and b = Buff_size - Buff_level. Note that this target rate represents
the sum of all individual target rates. Further changes are made based on the expected effect of the target with respect
to the current and future buffer 18 level. Denoting a safety margin by δ, we increase the target by

    T_inc = Buff_drain - T - Buff_level + δ * Buff_size

if

    Buff_level - Buff_drain + T < δ * Buff_size.

On the other hand, we decrease the target by

    T_dec = Buff_level + T - (1 - δ) * Buff_size

if

    Buff_level + T > (1 - δ) * Buff_size.

The operations described above are part of the joint buffer control. In the illustrated implementation, δ is set to 0.1.
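
The joint buffer control can be sketched as below. This is a minimal illustration under stated assumptions: the function name is hypothetical, the fullness scaling is taken as (a + 2b)/(2a + b), and the increase is applied when the expected level would fall under the lower safety margin.

```python
def joint_buffer_control(target, buff_level, buff_size, buff_drain, delta=0.1):
    """Adjust the total target T according to buffer fullness and safety margins."""
    a = buff_level
    b = buff_size - buff_level
    # Assumed scaling: a fuller buffer (large a) shrinks the target.
    target = target * (a + 2.0 * b) / (2.0 * a + b)

    if buff_level - buff_drain + target < delta * buff_size:
        # Expected level too low: raise the target by T_inc.
        target += buff_drain - target - buff_level + delta * buff_size
    elif buff_level + target > (1.0 - delta) * buff_size:
        # Expected level too high: lower the target by T_dec.
        target -= buff_level + target - (1.0 - delta) * buff_size
    return target
```

A half-full buffer leaves the target unchanged, an empty buffer pushes it up to Buff_drain + δ * Buff_size, and a buffer already at the upper margin drives it to zero.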
The next step is to redistribute the bits so that consistent quality is maintained across different objects. To achieve
this, the size of the object and the amount of activity which it is experiencing are obtained from the header information
of the previously coded objects. However, before distributing the target, a check is made to determine if the amount of
bits used for the header of the previous frame exceeds this bit count. The difference, s = T - H, denotes an
approximation to the number of bits available for coding the texture of every VO. If s < 0, then there may not be
enough bits to uniformly code each VO. In this case, all targets are made negative. As is explained later, this forces
lower bound constraints on the quantization parameter, thereby limiting the amount of bits spent on the texture.
Additionally, if s < 0, the pre-frameskip control 24 is invoked. Since the time instant is only updated after the
post-encoding stage 27, this frameskip control block serves as a correction towards the next time instant update. When
invoked, a non-zero value of N_skip_pre will be determined. This value is determined according to:

    while (s < 0) {
        increment N_skip_pre
        s = s + Buff_drain
    }

This combination of making the targets negative and skipping extra frames will allow the rate control algorithm to
better estimate the next target while providing uniform object quality.
In the event that s > 0, the total target, T, is distributed proportional to the header information of the previously
coded frame as:

    T_i = H_i * (1 + s/H)
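
The header check, the pre-frameskip loop and the header-proportional distribution can be sketched together. This is an illustration only: the function name is hypothetical, the distribution is read as T_i = H_i * (1 + s/H), and the value -1.0 used to mark a negative target is an arbitrary placeholder.

```python
def distribute_by_header(total_target, header_bits, buff_drain):
    """Return (per-VO targets, N_skip_pre) from the total target and previous header bits."""
    total_header = sum(header_bits)
    if total_header <= 0 or buff_drain <= 0:
        raise ValueError("header bits and buff_drain must be positive")
    s = total_target - total_header  # approximate bits available for texture
    n_skip_pre = 0
    if s < 0:
        # Not enough bits: skip extra frames until the deficit is drained away.
        while s < 0:
            n_skip_pre += 1
            s += buff_drain
        # All targets are made negative to force the quantizer lower bound later.
        targets = [-1.0] * len(header_bits)
    else:
        targets = [h * (1.0 + s / total_header) for h in header_bits]
    return targets, n_skip_pre
```

When s is non-negative the per-VO targets sum to the total target T, since H_i * (1 + s/H) equals H_i * T/H.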



Having a target for each VO, the next task is to determine individual distortion measures which correspond to the
desired rate. Treating the process separately for each VO and normalizing with respect to the MAD leaves us to solve
the classic quadratic:

    a*x^2 + b*x + c = 0, where
    a = X2_i
    b = X1_i
    c = T_texture_i / MAD_i



Keeping in mind that T_texture_i is a result of subtracting the header bits from the total bits, it is possible to obtain
small targets when performing low-bit-rate coding. To overcome this difficulty, we lower bound the target according to:

    T_texture_i = Max{ Buff_drain [illegible], T_texture_i }



In the event that the target was negative, the derived quantization parameter is lower bounded by LB_QUANT; otherwise
the usual clipping between 1 and 31 is employed. The use of this parameter ensures that a relatively small
amount of bits will go to coding the texture. The value of LB_QUANT should be chosen to be greater than 25. As an
alternative, we may decrease the amount of bits spent on shape coding by reducing the resolution of the alpha-plane
before coding.
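
The quantization level calculation can be sketched as below. Several points are assumptions rather than statements of the method: the unknown is taken as x = 1/Q, the quadratic is solved in the form X2*x^2 + X1*x = T_texture/MAD (so the sign convention of the constant term is assumed), the floor on the target is passed in as a hypothetical `t_floor` argument, and LB_QUANT = 26 is just one value greater than 25.

```python
import math


def quant_param(t_texture, mad, x1, x2, t_floor, lb_quant=26):
    """Derive a quantization parameter in [1, 31] from a per-VO texture target."""
    target_was_negative = t_texture < 0
    t = max(t_floor, t_texture)  # lower bound on the texture target
    if t <= 0 or mad <= 0:
        q = 31.0  # nothing sensible to solve: use the coarsest quantizer
    else:
        c = t / mad
        disc = x1 * x1 + 4.0 * x2 * c
        if x2 == 0 or disc < 0:
            q = x1 / c  # fall back to the first-order model T/MAD = X1/Q
        else:
            x = (-x1 + math.sqrt(disc)) / (2.0 * x2)  # x = 1/Q
            q = 1.0 / x if x > 0 else 31.0
    # A negative target forces the LB_QUANT bound; otherwise clip to 1..31.
    low = lb_quant if target_was_negative else 1
    return int(min(31, max(low, round(q))))
```

With X1 = 100, X2 = 1000 and MAD = 1, a texture target of 20 bits corresponds to Q = 10, because 100/10 + 1000/100 = 20.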


EXPERIMENTAL RESULTS 

The table below summarizes the testing groups for the algorithm described in the preceding section. An initial
quantization parameter of 15 was chosen for the I-frame, but thereafter the quantization parameter was automatically
determined.
Table 2 Testing groups for coding multiple video objects.

ID   Sequences          Bit Rate (kbps)   Frame Rate (Hz)   Format
1    Akiyo, Container   10                7.5               QCIF
2    Akiyo, Container   24                10                QCIF
3    News               48                7.5               CIF
4    Coastguard         48                10                QCIF
5    Coastguard         112               15                CIF



In the coding of multiple VOs, three parameters are coded: shape, motion and texture. The encoder software allows
the user to code the parameters in a combined mode or a separate mode; the simulation results presented here employ
separate parameter encoding. Table 3 provides details of the PSNR for each VO (Y-component only) and also reports
the actual bit rate achieved.
TABLE 3

Sequence     Target Rate   Frame Rate   Format   Y-psnr   Y-psnr   Y-psnr   Y-psnr   Y-psnr   Y-psnr   Actual Rate
             (kbps)        (Hz)                  VO0      VO1      VO2      VO3      VO4      VO5      (kbps)
Akiyo        10            7.5          QCIF     35.79    27.23    N/A      N/A      N/A      N/A      10.03
Container    10            7.5          QCIF     31.96    23.76    19.98    25.56    29.60    18.87    9.90
Akiyo        24            10           QCIF     40.42    29.82    N/A      N/A      N/A      N/A      23.90
Container    24            10           QCIF     31.54    22.73    19.75    25.37    29.08    18.62    23.78
News         48            7.5          CIF      35.87    28.50    27.83    24.68    N/A      N/A      47.08
Coastguard   48            10           QCIF     29.09    23.15    25.96    26.40    N/A      N/A      47.82
Coastguard   112           15           CIF      27.13    22.74    24.24    26.64    N/A      N/A      111.94
From these results it is evident that the parent scheme is capable of achieving the target bit rate with satisfactory
image quality. However, for low-latency applications we must also prevent the buffer from overflowing. For coding
multiple video objects, there is a great deal of overhead information that must be specified. Often, this will leave
the encoder with very few bits for coding the texture in each object. As a result, the encoder is forced to skip frames
and/or increase the quantization level to maintain a suitable buffer level. Plots illustrating the buffer occupancy are
provided in Figures 3 - 9 of our parent application. Additionally, the number of coded frames is specified. For each
sequence, 300 frames (10 sec) were coded.

In our parent application we presented a means of encoding multiple video objects in a scene based on a quadratic
rate-distortion model. The scheme is an enhancement of methods already proven for frame-based encoding simulations.
A frame skip control is invoked to keep the buffer from becoming too full. Instances in which the buffer does overflow
are indications of an unusually large amount of bits being spent on overhead. To prevent this, reductions can be
made in the amount of overhead information (e.g., bits spent on shape).

PREFERRED EMBODIMENT 

In accordance with the present invention, the fundamental approach set forth in our parent application, along with
modifications regarding the target distribution 32, mode of operation 33, shape-related rate control 25' and
post-frameskip 29' as shown in Figure 3, are employed. Two modes of operation are employed. The target distribution is
based on the size, motion and variance (or MAD) of each object. The weights for each contribution depend on a mode of
operation. A first mode is directed to low bit-rates (LowMode) and a second mode is directed to high bit-rates
(HighMode). The new target distribution produces significant improvements in subjective quality. Modifications to the
frameskip control prevent overflow of the register.

The method for performing joint rate control, as was the case in our parent application, can be broken into a
pre-encoding stage and a post-encoding stage. As set forth above, the pre-encoding stage 20 comprises: (i) target bit
estimation 21, (ii) joint buffer control 22, (iii) a pre-frameskip control 24, and (iv) a quantization level
calculation 25'. The target bit estimation 21 is also associated with a modified target distribution function 32 as
will be explained below. The quantization level calculation 25' is also associated with a shape rate-control function
as will be explained. The post-encoding stage 27 comprises: (i) updating the rate-distortion model 28; (ii) a
post-frameskip control 29' and a mode of operation function 33. Fig. 3 illustrates the present rate control process
and includes additional features associated with the present invention which include the added target distribution 32,
mode of operation 33, shape related rate control 25' and modified post-frameskip control 29'.

In the arrangement in our parent application, a target was sought for every object in the scene and all video objects
were coded at the same frame rate. The total bits for one frame were distributed proportional to the amount of header
bits in the previous corresponding object. In the present case, the bits are distributed proportional to a function
which takes into account the relative motion, size and variance or "MAD" of each object. The MAD (Mean Absolute
Difference) associated with each particular video object in each VOP (which is motion compensated) has been determined
to be a suitable measure of variance for purposes of rate control. In a preferred arrangement the MAD[i] factor is
selected to be MAD^2[i]. For a total target T, the amount of bits for every VO can be expressed as:

    T[i] = w_m * MOT[i] + w_s * SIZE[i] + w_v * MAD^2[i],

where MOT[i], SIZE[i] and MAD^2[i] denote the relative ratios of the motion, size, and mean absolute difference
parameters, respectively, and w_m, w_s and w_v are weights which satisfy the equation:

    w_m + w_s + w_v = 1



MODES OF OPERATION

The post-encoding function 27 includes mode of operation function 33. Specifically, two different modes of operation
are provided: one for encoding at low bit-rates and another for encoding at high bit-rates.

When encoding at high bit rates, the availability of bits allows the process to be flexible in its target assignment to
each VO. Under these circumstances, it is reasonable to impose homogeneous quality on each VO. Therefore, the
inclusion of the MAD parameter is important to the target distribution and should carry the highest weighting. On the
other hand, when the availability of bits is limited, it is very difficult to achieve homogeneous quality among the
various VO's. Also, under low bit-rate constraints, it is desirable to spend fewer bits on the background and more bits
on the foreground. In that case, the significance of the MAD parameter is decreased and the significance of the motion
parameter is increased. Based on the above arguments and experimental trial-and-error, the preferred weights are:
w_m = 0.6, w_s = 0.4, w_v = 0.0 for LowMode and w_m = 0.25, w_s = 0.25, w_v = 0.5 for HighMode.
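
The weighted target distribution can be sketched with the preferred weights. Two points are assumptions here: each "relative ratio" is computed as a VO's value divided by the sum over all VOs, and the weighted sum is multiplied by the total target T to obtain bits. All names are hypothetical.

```python
# Preferred weights (w_m, w_s, w_v) for each mode of operation.
WEIGHTS = {"LowMode": (0.6, 0.4, 0.0), "HighMode": (0.25, 0.25, 0.5)}


def ratios(values):
    """Relative ratio of each value to the total (equal shares if the total is zero)."""
    total = sum(values)
    if total == 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def distribute_targets(total_target, motion, size, mad, mode):
    """Split the total target among VOs by weighted motion, size and MAD^2 ratios."""
    w_m, w_s, w_v = WEIGHTS[mode]
    mot = ratios(motion)
    siz = ratios(size)
    var = ratios([m * m for m in mad])  # MAD^2 used as the variance measure
    return [
        total_target * (w_m * a + w_s * b + w_v * c)
        for a, b, c in zip(mot, siz, var)
    ]
```

Because each set of ratios sums to one and the weights sum to one, the per-VO targets always add up to the total target. For two VOs with motion 1 and 3, size 100 and 300, and equal MAD, a 1000-bit target splits 375/625 in HighMode and 250/750 in LowMode.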

Besides regulating the quality within each frame, it is also important to regulate the temporal quality, i.e.,
keep the frame skipping to a minimum. In HighMode, this is very easy to do since bits are plentiful. However, in
LowMode, frame skipping occurs much more often. In fact, the number of frames being skipped is a good indication of
the mode in which the process should be operating. This is expressed as follows:

    if (total_frames_skipped > SKIP_TH)
        Operate in LowMode
    else
        Operate in HighMode



In the current implementation, the skip threshold (SKIP_TH) was set to 2.

The decision process to obtain a mode of operation can also be seen as a constraint on the temporal resolution. If
the system is in LowMode, the encoder has skipped some specified number of frames. To obtain a reasonable
compromise between the spatial and temporal quality, LowMode will impose a lower bound on the calculated quantization
parameter. This lower bound, LB_QUANT, preferably is the same as that used in our previous application when the
target from the joint buffer control was less than the amount of header bits used in the last frame.

The modified function of the post-frameskip control 29' is to determine the current buffer 17 occupancy and ensure
that encoding of future video objects will not cause the buffer 17 to overflow. In the previous implementation, this
evaluation was based only on the current buffer 17 level. A positive value of N_skip_post was determined to satisfy the
following condition:

Buff_level - N_skip_post * Buff_drain < (1 - γ) * Buff_size

In the current embodiment, information from the previous frame is utilized to obtain a better expectation of the amount
of bits which may be required to be transmitted. The new condition is as follows:

Buff_level + B_last - (N_skip_post + 1) * Buff_drain < (1 - γ) * Buff_size,

where B_last denotes the total number of bits spent encoding the previous frame or set of video objects. In this way,
buffer 17 will readily accept the same amount of bits which were spent in the previous time coding instant. Any excess
bits should be absorbed into the safety margin, subsequently preventing overflow from occurring. As before, the gamma
parameter, or skip margin, is chosen to be 0.2.
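
The post-frameskip test can be sketched in Python as follows. The sketch assumes the condition reads Buff_level + B_last - (N_skip_post + 1) * Buff_drain < (1 - γ) * Buff_size and that the smallest non-negative N_skip_post satisfying it is wanted; the function name and argument names are hypothetical.

```python
def frames_to_skip(buff_level: float, b_last: float, buff_drain: float,
                   buff_size: float, gamma: float = 0.2) -> int:
    """Smallest N_skip_post >= 0 that keeps the projected buffer level under the margin.

    buff_level: current buffer occupancy in bits
    b_last:     bits spent on the previous frame (or set of video objects)
    buff_drain: bits removed from the buffer per frame period
    buff_size:  buffer capacity in bits
    gamma:      skip margin (0.2 in the text)
    """
    if buff_drain <= 0:
        raise ValueError("buff_drain must be positive, otherwise the buffer never empties")
    limit = (1.0 - gamma) * buff_size
    n_skip = 0
    # Skip one more frame period while the projected occupancy still reaches the limit.
    while buff_level + b_last - (n_skip + 1) * buff_drain >= limit:
        n_skip += 1
    return n_skip
```

For example, with a 10000-bit buffer, 1000 bits drained per frame period, 9000 bits currently stored and 3500 bits spent on the previous frame, the loop stops at four skipped frames, because 12500 - 5 * 1000 = 7500 is the first projected level below the 8000-bit limit.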

SHAPE RELATED RATE CONTROL

The binary shape information (or binary alpha plane) which defines a particular object is simply a mask which sets
a pixel value to "255" if it is part of the object or sets a pixel value to "0" if it is outside the object. According to version
7.0 of the MPEG-4 video verification model, rate control and rate reduction of the shape information can be achieved
through size conversion of the alpha plane. The possible conversion ratios (CR) are 1, 1/2, or 1/4. In other words, a
16 x 16 macroblock (MB) may be down-converted to an 8 x 8 or a 4 x 4 block. Each macroblock containing relative shape
information for the object can be down-converted for coding, then reconstructed at the original size. A conversion error
is calculated for every 4 x 4 pixel block (PB). The conversion error is defined as the sum of absolute differences between
the value of a pixel in the original PB and the reconstructed PB. If the conversion error is larger than sixteen times
"Alpha Threshold" (i.e., 16 x AlphaTH), then this PB is referred to as an "Error PB". If there is one "Error PB" in a
macroblock, then the conversion ratio (CR) for the macroblock is increased, with the maximum value being 1.
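
The conversion-ratio search described above can be sketched in Python. The Error PB test (sum of absolute differences over each 4 x 4 block compared with 16 x AlphaTH) follows the text. The down-conversion by majority vote, the reconstruction by pixel replication, and the choice to start from CR = 1/4 are stand-in assumptions, not the filters or search order of the verification model, and all function names are hypothetical.

```python
from typing import List

Block = List[List[int]]  # 16 x 16 alpha values, each 0 or 255

MB_SIZE = 16  # macroblock width and height
PB_SIZE = 4   # pixel block width and height


def down_up(mb: Block, factor: int) -> Block:
    """Down-convert by `factor` (majority vote), then restore size by replication."""
    out = [[0] * MB_SIZE for _ in range(MB_SIZE)]
    for by in range(0, MB_SIZE, factor):
        for bx in range(0, MB_SIZE, factor):
            cells = [(y, x) for y in range(by, by + factor)
                     for x in range(bx, bx + factor)]
            on = sum(1 for y, x in cells if mb[y][x] == 255)
            value = 255 if 2 * on >= factor * factor else 0
            for y, x in cells:
                out[y][x] = value
    return out


def has_error_pb(orig: Block, recon: Block, alpha_th: int) -> bool:
    """True if any 4 x 4 pixel block has a conversion error above 16 * alpha_th."""
    for by in range(0, MB_SIZE, PB_SIZE):
        for bx in range(0, MB_SIZE, PB_SIZE):
            sad = sum(abs(orig[y][x] - recon[y][x])
                      for y in range(by, by + PB_SIZE)
                      for x in range(bx, bx + PB_SIZE))
            if sad > 16 * alpha_th:
                return True
    return False


def select_conversion_ratio(mb: Block, alpha_th: int) -> float:
    """Try CR = 1/4, then 1/2; use CR = 1 if both leave an Error PB."""
    for factor in (4, 2):
        if not has_error_pb(mb, down_up(mb, factor), alpha_th):
            return 1.0 / factor
    return 1.0
```

Since a single mismatched pixel contributes 255 to the conversion error, AlphaTH = 0 tolerates no mismatch at all, whereas AlphaTH = 16 (threshold 256) tolerates one wrong pixel per 4 x 4 block. A larger AlphaTH therefore lets more macroblocks be coded at a reduced size.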

From the above discussion of shape coding, it is evident that the value of AlphaTH has considerable effect on the
number of bits which will be spent on shape information for each VO. A method is described according to the present
invention for controlling the shape information based on the selection of the value of AlphaTH and the two modes of
operation, LowMode and HighMode.

Assume that AlphaTH initially is set to a value AlphaINI. During an I-frame and the first P-frame, this initial value
will be used to code the shape for every object in those frames. After the encoding stage, the rate control algorithm will
determine the mode of operation. If the mode of operation is determined to be LowMode, then the system will increment
the current AlphaTH by AlphaINC. If the mode of operation is determined to be HighMode, then the system will
decrement the current AlphaTH by AlphaDEC. The maximum and minimum values of AlphaTH are AlphaMAX and 0,
respectively. This shape rate control algorithm is summarized in Fig. 4. The horizontal axis denotes time. Along this axis are
markings which identify a mode of operation (H = HighMode, L = LowMode). The vertical axis indicates a corresponding
AlphaTH at each coding instant. In the example, AlphaMAX is set to 16, and the initial value AlphaINI = 8 (one-half
MAX). Also, AlphaDEC = 5 and AlphaINC = 3 in the example. In the most general case, AlphaINC and AlphaDEC need
not be constants, but rather functions of the current AlphaTH (e.g., larger steps when closer to zero and smaller steps
when closer to AlphaMAX). In a preferred arrangement, AlphaMAX = 12, AlphaINC = AlphaDEC = 4 and AlphaINI = 0.
Note that Fig. 4 emphasizes the actions taken at each coding instant, where each coding instant is uniformly
spaced. In an actual simulation, LowMode is only in operation after the total skipped frames in the previous
post-encoding stage is greater than a selected value of SKIP_TH, thereby making the time coding instants non-uniform.
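
The AlphaTH update rule can be sketched in Python using the constant step sizes of the Fig. 4 example (AlphaMAX = 16, AlphaINI = 8, AlphaINC = 3, AlphaDEC = 5). The function names and the mode sequence in the usage note are hypothetical; the actual sequence plotted in Fig. 4 is not reproduced here.

```python
from typing import List


def update_alpha_th(alpha_th: int, mode: str, alpha_inc: int,
                    alpha_dec: int, alpha_max: int) -> int:
    """Raise AlphaTH in LowMode, lower it in HighMode, clamped to [0, alpha_max]."""
    if mode == "LowMode":
        return min(alpha_th + alpha_inc, alpha_max)
    if mode == "HighMode":
        return max(alpha_th - alpha_dec, 0)
    raise ValueError("mode must be 'LowMode' or 'HighMode'")


def alpha_th_trace(modes: List[str], alpha_ini: int, alpha_inc: int,
                   alpha_dec: int, alpha_max: int) -> List[int]:
    """AlphaTH at each coding instant, starting from alpha_ini."""
    trace = [alpha_ini]
    for mode in modes:
        trace.append(update_alpha_th(trace[-1], mode, alpha_inc, alpha_dec, alpha_max))
    return trace
```

With the example constants, three LowMode decisions followed by three HighMode decisions take AlphaTH from 8 up to the ceiling of 16 and back down to 1. Because the decrement is larger than the increment in that example, the threshold recovers toward high-quality shape coding faster than it degrades.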

This adaptive selection of AlphaTH based on the mode of operation is quite effective in reducing the number of bits
required for shape while maintaining sufficient quality at very low bit rates. At high bit rates, or simulations in which
LowMode is less frequent, the shape information can be coded using a low AlphaTH, resulting in very high quality object
boundaries, as expected. This method provides additional functionality to the mode of operation and complements its
efforts in regulating the temporal and spatial coding resolutions by freeing up additional texture bits and/or maintaining
suitable buffer occupancy.


EXPERIMENTAL RESULTS

In Table 2 above, the testing conditions for low bit-rate simulations are given; in Table 4, the testing conditions for
high bit-rate simulations are given. In each, an initial quantization parameter of 15 was chosen for the I-frame, but
thereafter the quantization parameter was automatically determined.



Table 4

High bit-rate testing groups for coding multiple video objects

ID    Sequences           Bit Rate (kbps)    Frame Rate (Hz)    Format
6     Akiyo, Container    48                 10                 QCIF
7     News                192                15                 CIF
      Coastguard          384                30                 CIF

In Table 5, the average PSNR (peak signal to noise ratio) values for each VO are given under the low bit-rate
conditions. The number of coded frames, the average quantization scale within each video object and the actual bit rate
achieved are also provided. The same information is provided in Table 6 for the high bit-rate simulations. In Figures 6 -
16, plots of the buffer occupancy for each test sequence illustrate the exceptional control exhibited by the method under
the low bit-rate and high bit-rate conditions.

In accordance with the foregoing invention, improvements on target distribution were made. Also, a shape rate control
mechanism has been implemented. Simulations of each testing group show improvements over the previous
implementation. The highlights of the proposed joint rate control scheme are: good subjective quality, excellent buffer
regulation, homogeneous quality among VO's, joint control of shape and texture coding and a flexible framework to
compromise spatial and temporal quality.

The modifications to the target distribution serve to better model the variance within an object. Since the variance
has traditionally been used to indicate the amount of bits needed for coding, the distortion among objects will be more
consistent.

The adaptive selection of AlphaTH based on the mode of operation is quite effective in reducing the number of bits
for shape while maintaining sufficient quality at very low bit rates. At high bit rates, or simulations in which LowMode is
less frequent, the shape information can be coded using a low AlphaTH, resulting in very high quality object boundaries.
This method provides additional functionality to the mode of operation and complements its efforts in regulating the
temporal and spatial coding resolutions by freeing up additional texture bits and/or maintaining suitable buffer
occupancy.

Overall, the method is able to accommodate the functionality of the MPEG-4 standard in terms of coding multiple
video objects for low-latency and low-bit-rate applications. It has also been shown to be scalable to higher bit rate
applications.

While the invention has been described in terms of a preferred embodiment, various modifications may be made in
details of this implementation without departing from the scope of this invention, which is set forth in the following
claims.



Claims 



1. A method of adaptively encoding a sequence of frames of image information, wherein at least some of said frames
contain a plurality of video objects, for providing a compressed video signal to a transmission channel by means of 
a buffer having a variable input bit rate and a substantially constant output bit rate comprising the steps of: 

encoding each of said video objects in each of a set of frames using coding means including a processor for 
performing discrete cosine transform to produce transform coefficients and a quantizer for quantizing the trans- 
form coefficients to generate image-representative code bits at a variable rate, said encoding step producing 
texture, motion and shape information for each said video object; 

storing said image representative code bits in said buffer; 

restricting the contents of said buffer with respect to a predetermined limit value by adjusting quantization 
parameters utilized by said quantizer with respect to a reference value according to a quadratic rate distortion 
model to increase or decrease the amount of code bits generated by said coding circuit for said video objects
in successive ones of said frames; 

estimating a target number of bits for encoding each video object in each successive frame in a sequence
occurring over a predetermined time interval following the first frame by distributing a target number of bits for
all objects in each video object plane among said objects in accordance with a function of relative motion, size
and variance parameters associated with corresponding objects in the corresponding object plane; and

setting said variable rate for encoding at one of at least a higher rate and a lower rate to avoid overflow of said
buffer while preserving image quality. 

2. The method of claim 1 wherein said function further comprises a separate weighting factor for each of said motion,
size and variance parameters.

3. The method of claim 1 wherein said variance parameter is derived from calculation of a mean absolute difference
value for each pixel of a video object in a given object plane as compared to the corresponding pixel in a preceding
object plane.

4. The method of claim 1 wherein said method further comprises the step of skipping the coding of a frame for a frame
period whenever the difference between buffer bit capacity and current buffer level is less than a predetermined
margin at the end of the encoding of all video objects in a frame.

5. The method of claim 3 wherein said function of relative motion, size and variance parameters includes a variable
proportional to the square of said mean absolute difference value.







6. The method of claim 3 wherein said function is:

T[i] = W_m * MOT[i] + W_s * SIZE[i] + W_v * MAD^2[i]

where MOT[i], SIZE[i] and MAD[i] denote the relative ratios of motion, size and mean absolute difference parameters
and W_m, W_s and W_v are weights which satisfy the expression

W_m + W_s + W_v = 1.



7. The method of claim 6 wherein W_v is selected at a lower value and a higher value when said encoding rate is said
lower rate and said higher rate, respectively.

8. The method of claim 7 wherein said weights are selected as

W_m = 0.6, W_s = 0.4 and W_v = 0

for said lower encoding rate and

W_m = 0.25, W_s = 0.25 and W_v = 0.5

for said higher encoding rate.

9. The method of claim 7 wherein said weight W_v = 0 for said lower encoding rate and is greater than or
for said higher encoding rate.

10. The method of claim 1 wherein said setting of said variable rate is determined by counting a number of consecutive
skipped frames in a time interval immediately preceding said setting step.

11. The method of claim 10 wherein said variable rate is set at said higher rate when said number of skipped frames is
less than a predetermined number.

12. The method of claim 11 wherein said predetermined number is two.

13. The method of claim 4 wherein said variable rate is set at said higher rate when the number of said skipped frames
is less than a predetermined number.

14. A method of adaptively encoding a sequence of frames of image information, wherein at least some of said frames
contain a plurality of video objects, for providing a compressed video signal to a transmission channel by means of
a buffer having a variable input bit rate and a substantially constant output bit rate comprising the steps of:

encoding each of said video objects in each of a set of frames using coding means including a processor for
performing discrete cosine transform to produce transform coefficients and a quantizer for quantizing the
transform coefficients to generate image-representative code bits at a variable rate, said encoding step producing
texture, motion and shape information for each said video object;

storing said image representative code bits in said buffer;

restricting the contents of said buffer with respect to a predetermined limit value by adjusting quantization
parameters utilized by said quantizer with respect to a reference value according to a quadratic rate distortion
model to increase or decrease the amount of code bits generated by said coding circuit for said video objects
in successive ones of said frames;

estimating a target number of bits for encoding each video object in each successive frame in a sequence
occurring over a predetermined time interval following the first frame by distributing a target number of bits for
all objects in each video object plane among said objects in accordance with a function of relative motion, size
and variance parameters associated with corresponding objects in the corresponding object plane; and

coding said shape information for each object according to a mask; 

size converting each macroblock of each said object for encoding according to a predetermined conversion 
ratio; 

reconstructing the original size of each said macroblock; 

determining a conversion error for each pixel block within said macro block;

comparing said conversion errors to a predetermined threshold to identify error pixel blocks; and

increasing said conversion ratio and redetermining conversion errors and comparison thereof to said threshold
until said threshold is not exceeded or until a maximum conversion ratio is reached.

15. The method of claim 14 and further comprising:

setting said variable rate for encoding at one of at least a higher rate and a lower rate to avoid overflow of said
buffer while preserving image quality.

16. The method of claim 15 wherein said setting of said variable rate is determined by counting a number of
consecutive skipped frames in a time interval immediately preceding said setting step.

17. The method of claim 16 wherein said function of relative motion, size and variance parameters includes a variable
proportional to the square of said mean absolute difference value.

18. The method of claim 17 wherein said function is:

T[i] = W_m * MOT[i] + W_s * SIZE[i] + W_v * MAD^2[i]

where MOT[i], SIZE[i] and MAD[i] denote the relative ratios of motion, size and mean absolute difference parameters
and W_m, W_s and W_v are weights which satisfy the expression

W_m + W_s + W_v = 1.

19. The method of claim 19 wherein W_v is selected at a lower value and a higher value when said encoding rate is said
lower rate and said higher rate, respectively.

20. A method of adaptively encoding a sequence of frames of image information, wherein at least some of said frames
contain a plurality of video objects, for providing a compressed video signal to a transmission channel by means of 
a buffer having a variable input bit rate and a substantially constant output bit rate comprising the steps of: 

encoding each of said video objects in each of a set of frames using coding means including a processor for
performing discrete cosine transform to produce transform coefficients and a quantizer for quantizing the
transform coefficients to generate image-representative code bits at a variable rate, said encoding step producing
texture, motion and shape information for each said video object;

storing said image representative code bits in said buffer;

restricting the contents of said buffer with respect to a predetermined limit value by adjusting quantization
parameters utilized by said quantizer with respect to a reference value according to a quadratic rate distortion
model to increase or decrease the amount of code bits generated by said coding circuit for said video objects
in successive ones of said frames;

setting said variable rate for encoding at one of at least a higher rate and a lower rate to avoid overflow of said
buffer while preserving image quality;

size converting said shape information for each macroblock of each said object according to a predetermined 
conversion ratio; 

determining a conversion error for each pixel block within each said macro block; 

comparing said conversion errors to a predetermined Alpha threshold to identify error pixel blocks;

increasing said conversion ratio and redetermining conversion errors and comparison thereof to said Alpha
threshold until said Alpha threshold is not exceeded or until a maximum conversion ratio is reached;

skipping the coding of a frame for a frame period whenever the difference between buffer bit capacity and
current buffer level is less than a predetermined margin at the end of the encoding of all video objects in a frame;

said setting of said variable rate being determined by counting a number of consecutive skipped frames in a
time interval immediately preceding said setting step, said variable rate being set at said higher rate when said
number of skipped frames is less than a predetermined number and being set at said lower rate when said
number of skipped frames is equal to or greater than said predetermined number;

after encoding, determining whether said higher rate or said lower rate is operative; and 

increasing said Alpha threshold if said lower rate is operative and decreasing said Alpha threshold if said
higher rate is operative for a succeeding coding interval.

21. The method of claim 20, wherein:

said Alpha threshold is set initially at a value substantially midway between zero and a maximum value. 

22. The method of claim 20, wherein: 

said Alpha threshold is increased in value in increments of a first predetermined level and is decreased in value 
in decrements of a second predetermined level. 

23. The method of claim 21, wherein:

said increment level is less than said decrement level. 

24. The method of claim 20 and further comprising:

estimating a target number of bits for encoding each video object in each successive frame in a sequence
occurring over a predetermined time interval following the first frame by distributing a target number of bits for
all objects in each video object plane among said objects in accordance with a function of relative motion, size
and variance parameters associated with corresponding objects in the corresponding object plane.